UC Davis DataLab Install Guide
Overview
This toolkit will walk you through installing some of the most common data science tools used at the UC Davis DataLab and beyond. Use the sidebar on the left to navigate to the specific program you would like to install. If you are taking a class or workshop using these tools, please install them prior to attending.
Anaconda (Python)
Introduction
Anaconda on Windows
Anaconda on Mac
Verifying your install
Installation troubleshooting
If you are not able to successfully install Anaconda on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
DBeaver (SQL Database)
Introduction
DBeaver on Windows
DBeaver on Mac
Verifying your install
Installation troubleshooting
If you are not able to successfully install DBeaver on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
Git
Introduction
Git is a ubiquitous piece of software for version control. Version control describes a process of storing and organizing multiple versions (or copies) of documents that you create. Approaches to version control range from simple to complex and can involve the use of various human workflows and/or software applications to accomplish the overall goal of storing and managing multiple versions of the same document(s).
Most people have a folder/directory somewhere on their computer that looks something like this:
Or perhaps, this:
This is a rudimentary form of version control that relies completely on the human workflow of saving multiple versions of a file. This system works minimally well, in that it does provide you with a history of file versions theoretically organized by their time sequence. But this filesystem method provides no information about how the file has changed from version to version, why you might have saved a particular version, or specifically how the various versions are related. This human-managed filesystem approach is more subject to error than software-assisted version control systems. It is not uncommon for users to make mistakes when naming file versions, or to go back and edit files out of sequence. Software-assisted version control systems such as Git were designed to solve this problem.
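To make concrete what a folder of renamed copies does not tell you, here is a minimal Python sketch that compares two versions of a document and reports what changed between them. It is only an illustration of the idea of a "diff", not how Git itself is implemented; the document contents and the function name `describe_changes` are hypothetical.

```python
import difflib

# Two hypothetical versions of the same document, as they might be saved
# by hand under names like "draft_v1.txt" and "draft_v2.txt".
version_1 = ["Introduction", "Methods", "Results"]
version_2 = ["Introduction", "Methods (revised)", "Results", "Discussion"]


def describe_changes(old, new):
    """Return (removed, added) lines between two versions of a document."""
    diff_lines = list(difflib.ndiff(old, new))
    # ndiff prefixes removed lines with "- " and added lines with "+ ".
    removed = [line[2:] for line in diff_lines if line.startswith("- ")]
    added = [line[2:] for line in diff_lines if line.startswith("+ ")]
    return removed, added


removed, added = describe_changes(version_1, version_2)
print("Removed:", removed)
print("Added:", added)
```

A version control system records this kind of change information for every saved version, along with a message explaining why the change was made, so you do not have to reconstruct it from file names.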
If you would like more information on how Git works, read more in the Git Book for free.
Git on Windows
Follow these step-by-step instructions if you’re installing Git on a Windows machine:
First, launch a web browser. The image below shows the Microsoft Edge browser:
Next, navigate to the Git download page at https://git-scm.com/downloads in your browser:
Select “Windows” from the Downloads portion of the Git webpage. Git will display the following page and automatically begin downloading the correct version of the Git software. If the download doesn’t start automatically, click on the “Click here to download manually” link:
When the download is complete, open or run the downloaded file. This will look slightly different depending on your browser:
A screen will appear asking for permissions for the Git application to make changes to your device. Click on the Yes button:
Click Next to accept the user license:
Leave the default “Destination Location” unchanged (usually C:\Program Files\Git) and hit Next:
You will see a screen like the one below asking you to “Select Components”:
Leave all of the default components selected and also check the boxes next to “Additional Icons” and its sub-item, “On the Desktop.” Your completed configurations window should have the following components selected:
Additional Icons
-> On the Desktop
Windows Explorer integration
-> Git Bash Here
-> Git GUI Here
Git LFS (Large File Support)
Associate .git* configuration files with default text editor
Associate .sh files to be run with Bash
And should look like this:
After verifying that you have the necessary components selected as per above, hit Next.
The next screen will ask you to “Select a Start Menu Folder.” Keep the default value of Git and hit Next:
Leave the default “Use Vim (the ubiquitous text editor) as Git’s default editor” selected on the “Choosing the default editor used by Git” screen and hit Next:
On the next screen, leave the default “let Git decide” option selected and hit Next:
Leave the default “Git from the command line and also from 3rd-party software” selected and hit Next:
On the next “Choosing HTTPS transport backend” page leave the default “Use the OpenSSL library” option selected and hit Next:
Leave the default “Checkout Windows-style, commit Unix-style line endings” selected on the next page and hit Next:
Keep the default “Use MinTTY (the default terminal of MSYS2)” selected on the “Configuring the terminal emulator to use with Git Bash” window and hit Next:
Keep the default value of “Default (fast-forward or merge)” on the “Choose the default behavior of ‘git pull’” page and hit Next:
Keep the default value of “Git Credential Manager Core” on the “Choose a credential helper” page and hit Next:
Keep the default values on the “Configuration extra options” page by keeping “Enable file system caching” checked and “Enable symbolic links” unchecked and then hit Next:
Make sure that no options are checked in the “Configuring experimental options” page and hit Install:
After you hit this Install button as per above, you will see an install progress screen like the one below:
When the install is complete, a new, “Completing the Git Setup Wizard” window like the one below will appear:
Make sure that all of the options on this window are unchecked as in the image below and then hit the Finish button:
This will complete your installation process.
Windows users should verify that Git Bash was installed along with Git for Windows, as it is necessary for working with Git from the command line.
Git on Mac
If you are installing Git on a Mac, there is no extra configuration. Simply go to the Git download page at https://git-scm.com/downloads, choose the latest version for Mac, and run the installer package when it has finished downloading. If you get an “unknown developer” warning during the install process, follow the instructions at the beginning of the video at https://www.youtube.com/watch?v=__kr-Ew5kbE to help you work through this problem.
Verifying your install
Whether you’re installing on Windows or Mac, note that unlike most applications that you’ve installed before, you will not find a “Git” application in your programs or applications directory once the installation is complete. As long as you don’t get an explicit error message during the installation process, you can assume that the software was successfully installed. Git is a command-line application, and we’ll cover how to use the command line during the interactive session. If you’re already familiar with the command line, you can verify your install by opening the terminal (on Windows, that will be Git Bash in your programs menu) and typing git --version. You should then see your installed version (e.g., git version 2.12.2.windows.2, or git version 2.12.2.mac.2), and not the error “command not found.”
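If you already have Python installed (for example through Anaconda), the same check can be scripted. The following is a minimal sketch, assuming Git is on your PATH when it is installed; the function name `installed_git_version` is hypothetical and not part of any library.

```python
import shutil
import subprocess


def installed_git_version():
    """Return the output of `git --version`, or None if Git cannot be run."""
    # shutil.which looks for the "git" executable on the PATH.
    git_path = shutil.which("git")
    if git_path is None:
        return None
    result = subprocess.run(
        [git_path, "--version"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


version = installed_git_version()
if version is None:
    print("Git was not found on your PATH.")
else:
    print(version)
```

If the script reports that Git was not found, try the manual check in the terminal described above before reinstalling, since the problem may be with the PATH rather than with the install itself.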
Installation troubleshooting
If you are not able to successfully install Git on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
Jupyter Notebooks
Introduction
Jupyter on Windows
Jupyter on Mac
Verifying your install
Installation troubleshooting
If you are not able to successfully install Jupyter Notebooks on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
Linux Subsystem for Windows (LSW; a.k.a. Command Line)
Introduction
LSW on Windows
Verifying your install
Installation troubleshooting
If you are not able to successfully install the Linux Subsystem for Windows on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
OpenRefine
Introduction
OpenRefine is an open source tool used to clean and pre-process messy data. While most people are familiar with data cleaning in their coding tool of choice (R, Python, Julia, etc.), OpenRefine is designed to provide powerful cleaning capabilities with minimal overhead.
OpenRefine on Windows
Open your web browser of choice and navigate to the OpenRefine homepage at https://openrefine.org/. Click on the download button in the left sidebar.
On the download page, scroll to the latest version of OpenRefine and select the Windows kit. If you are unsure if you have Java installed on your system, choose the Windows kit with embedded Java instead.
Once the download has completed, open the zip and move the contents to a convenient location on your computer.
Open the resulting directory, and double click on the openrefine.exe executable.
The OpenRefine executable will start a terminal window and, shortly after, open a tab in your default web browser with the OpenRefine interface.
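If the browser tab does not open on its own, you can check whether OpenRefine is running with a short Python sketch like the one below. It assumes OpenRefine is serving its interface at its default local address, http://127.0.0.1:3333/, which is not stated in this guide and will differ if you have changed the port. The function name `openrefine_is_running` is hypothetical.

```python
import urllib.error
import urllib.request


def openrefine_is_running(url="http://127.0.0.1:3333/", timeout=3):
    """Return True if a web server answers at the assumed OpenRefine address."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Nothing is listening at that address, or the request timed out.
        return False


if openrefine_is_running():
    print("OpenRefine appears to be running.")
else:
    print("No response from the assumed OpenRefine address.")
```

Note that this only confirms that something is answering at that address. Opening the address in your browser is the simplest way to confirm it is the OpenRefine interface.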
OpenRefine on Mac
Verifying your install
TODO - need data
Installation troubleshooting
If you are not able to successfully install OpenRefine on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
R/RStudio
Introduction
“R” is both a free and open source programming language designed for statistical computing and graphics, and the software for interpreting the code written in the R language. RStudio is an integrated development environment (IDE) within which you can write and execute code, and interact with the R software. It’s an interface for working with the R software that allows you to see your code, plots, variables, etc. all on one screen. This functionality can help you work with R, connect it with other tools, and manage your workspace and projects. You cannot run RStudio without having R installed. While RStudio is a commercial product, the free version is sufficient for most researchers.
Why learn R? There are many advantages to working with R.
- Scientific integrity. Working with a scripting language like R facilitates reproducible research. Having the commands for an analysis captured in code promotes transparency and reproducibility. Someone using your code and data should be able to exactly reproduce your analyses. An increasing number of research journals not only encourage, but are beginning to require, submission of code along with a manuscript.
- Many data types and sizes. R was designed for statistical computing and thus incorporates many data structures and types to facilitate analyses. It can also connect to local and cloud databases.
- Graphics. R has built-in plotting functionalities that allow you to adjust any aspect of your graph to effectively tell the story of your data.
- Open and cross-platform. Because R is free, open-source software that works across many different operating systems, anyone can inspect the source code, and report and fix bugs. It is supported by a large community of users and developers.
- Interdisciplinary and extensible. Because anyone can write and share R packages, it provides a framework for integrating approaches across domains, encouraging innovation.
R/RStudio on Windows
Follow these step-by-step instructions to install R and RStudio on a Windows machine:
First open your internet browser of choice, and navigate to https://www.r-project.org/. Click on download R.
On the following page, select the link under whatever location is closest to you for the best download speed (though any will work).
Next, click the Download R for Windows link.
Click on the link base to go to the download page.
Finally, click Download R X.X.X for Windows to download the installer.
When the download is complete, run the R installer. This will look slightly different depending on your browser.
Select your language and then accept the license agreement by hitting Next >.
Leave the default install location and select Next >.
If you know what kind of machine you are on, you can specify whether you want the 32- or 64-bit version of R. If you do not know, it is safe to install both.
Keep the default startup options and hit Next >.
You most likely will not want an R shortcut on your desktop, as you will almost certainly use RStudio as an interface. You can still have one if you would like. Otherwise, accept the defaults and hit Next >.
Wait for the installation to complete.
Once it is done, hit Finish. You’ve now installed R! However, we still need to install RStudio separately.
Navigate to the RStudio homepage at https://rstudio.com/ and click the download button.
Scroll down and select the free version to download. If you are using RStudio for commercial purposes you will need to look into RStudio’s licensing terms to see if you need to pay for the pro version.
Download the RStudio installer for your machine.
Run the installer just as you did for the R download.
R/RStudio on Mac
If you are installing R/RStudio on a Mac, there is no extra configuration. Simply go to the download pages for R and RStudio and choose the latest version for Mac. Run the installer package when it has finished downloading. If you receive an error regarding the app being from an unidentified developer, please follow the instructions here.
Verifying your install
Once you have installed both R and RStudio, you should be able to run RStudio on your machine. You can verify that your install is working by opening RStudio and typing paste("Hello World!") into the console as shown below. If the code runs, you should see a response that says [1] "Hello World!". If that works, you are all set!
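If you also have Python installed, a similar check can be run outside of RStudio. The sketch below assumes the Rscript program that ships with R is on your PATH, which may not be the case on Windows unless you have added it yourself. The function name `r_hello_world` is hypothetical. It uses cat() so that R prints the text without the [1] prefix or quotation marks.

```python
import shutil
import subprocess


def r_hello_world():
    """Run a one-line R command via Rscript; return its output or None."""
    # Assumption: Rscript is discoverable on the PATH.
    rscript_path = shutil.which("Rscript")
    if rscript_path is None:
        return None
    result = subprocess.run(
        [rscript_path, "-e", "cat(paste('Hello World!'))"],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    return result.stdout.strip()


output = r_hello_world()
if output is None:
    print("Rscript was not found on your PATH, or R reported an error.")
else:
    print(output)
```

A missing Rscript here does not necessarily mean R failed to install, so the RStudio console check above remains the more reliable test.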
Installation troubleshooting
If you are not able to successfully install R/RStudio on your own, please attend DataLab’s Virtual Office Hours. Click here for more information and to receive a Zoom link.
Contributions
This research toolkit is maintained by the UC Davis DataLab, and is open for contribution. See how you can contribute on the GitHub repo.
This toolkit has been made possible thanks to contributions by:
- Carl Stahmer
- Jared Joseph